Abstract. - Information overload in modern society calls for highly efficient recommendation
algorithms. In this letter we present a novel diffusion-based recommendation model, with users'
ratings built into a transition matrix. To speed up computation we introduce a Green function
method. Numerical tests on a benchmark database show that our predictions are superior to
those of the standard recommendation methods.



Introduction. - The exponential growth of the Internet [1]
and the World Wide Web [2] confronts us with
information overload: we face too much data and too many data
sources, making us unable to find the relevant results. As
a consequence we need automated ways to deal with the
data. Recently, a lot of work has been done in this field.
The two main directions of research are correlation-based
methods [3,4] and spectral methods [5]. A good
overview of the achieved results can be found in [6,7].

Despite the amount of work done, the problem is not
satisfactorily solved yet, as both the prediction accuracy
and the computational complexity can be improved
further. In this letter we propose a new method based
on diffusion of the users' opinions in an object-to-object
network. This method can be used for any data where
users evaluate objects on an integer scale. Using data
from a real recommender application (the GroupLens project)
we show that the present model performs better than the
standard recommendation methods. In addition, a Green
function method is proposed to further reduce computation
in some cases.

The model. - In the input data, we label the total number
of users as $M$ and the total number of objects
as $N$ (since we focus here on movie recommendation,
instead of the general term object we often use the term
movie). To make a better distinction between these two
groups, for user-related indices we use lower-case letters
$i, j, k, \dots$ and for movie-related indices we use Greek letters
$\alpha, \beta, \gamma, \dots$. We assume that users' assessments are
given on an integer scale from 1 (very bad) to 5 (very
good). The rating of user $i$ for movie $\alpha$ we denote $v_{i\alpha}$.
The number of movies rated by user $i$ we label $k_i$. The rating
data can be described by a weighted bipartite graph
where the link between user $i$ and movie $\alpha$ is formed when
user $i$ has already rated movie $\alpha$ and the link weight is
$v_{i\alpha}$. Such a bipartite graph can give rise to two different
types of graphs (often called projections): object-to-object
and user-to-user. A general discussion of information networks
can be found in [8]; projections of bipartite graphs
are closely investigated in [9,10].

The recommendation process starts with preparation of 
a particular object-to-object projection of the input data. 
Projections usually lead to a loss of information. In order 
to eliminate this phenomenon, instead of merely creating 
a link between two movies, we link the ratings given to 
this pair of movies. As a result we obtain 25 separate 
connections (channels) for each movie pair. This is illus- 
trated in fig. 1 on an example of a user who has rated 
three movies; as a result, three links are created between 
the given movies. When we process data from all users, 
contributions from all users shall accumulate to obtain an 
aggregate representation of the input data: a weighted 
movie-to-movie network. From the methodological point 
of view, this model is similar to the well-known Quantum 
Diffusion process (see [11,12]). 

Fig. 1: Graphical representation of the links created by a user
who has rated only movies 1 (rating 5), 2 (rating 3), and 3
(rating 4).

To each user we need to assign a weight. In general,
if user $i$ has rated $k_i$ movies, $k_i(k_i - 1)/2$ links in the
network are created (or fortified). If we set the user weight
to $1/(k_i - 1)$, the total contribution of user $i$ is directly
proportional to $k_i$, and this is a plausible premise.^1 Since
the users who have seen only one movie add no links to
the movie-to-movie network, the divergence of the weight
$1/(k_i - 1)$ at $k_i = 1$ is not an obstacle.
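
The proportionality to the number of rated movies follows from a one-line count. In this worked step the symbol $c_i$ is introduced only for illustration, as a label for the total weight contributed by user $i$ when each pair of movies is counted once:

```latex
c_i \;=\; \underbrace{\frac{k_i (k_i - 1)}{2}}_{\text{number of links}}
\times \underbrace{\frac{1}{k_i - 1}}_{\text{weight per link}}
\;=\; \frac{k_i}{2} \;\propto\; k_i , \qquad k_i \geq 2 .
```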

Since between each pair of movies $(\alpha, \beta)$ we create multiple
links, it is convenient to write their weights as a
$5 \times 5$ matrix $W_{\alpha\beta}$. Each rating can be represented by
a column vector in 5-dimensional space: rating $v_{i\alpha} = 1$
we represent as $\mathbf{v}_{i\alpha} = (1, 0, 0, 0, 0)^T$, rating $v_{i\alpha} = 2$ as
$\mathbf{v}_{i\alpha} = (0, 1, 0, 0, 0)^T$, and so forth. If the vote has not been
given yet, we set $\mathbf{v}_{i\alpha} = (0, 0, 0, 0, 0)^T$. Then using the linking
scheme from fig. 1 and the user weights $1/(k_i - 1)$ we
write

$$W_{\alpha\beta} = \sum_{i=1}^{M} \frac{\mathbf{v}_{i\alpha} \mathbf{v}_{i\beta}^T}{k_i - 1}, \qquad (1)$$

where we sum contributions from all users. In this way
we convert the original data represented by a weighted
bipartite graph into a weighted object-to-object network.
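
As a minimal sketch of this construction (not the authors' code), the accumulation of user contributions can be written with NumPy. The function name `build_weight_matrix`, the input layout (one dict per user mapping a zero-based movie index to a rating) and the dense array are assumptions made for illustration; diagonal blocks are left empty on the assumption that links are created only between distinct movies.

```python
import numpy as np

def build_weight_matrix(ratings, n_movies):
    """Accumulate the 5N x 5N weight matrix W from per-user ratings.

    ratings: list with one dict per user, {movie_index: rating in 1..5}
             (hypothetical input layout chosen for this sketch).
    """
    W = np.zeros((5 * n_movies, 5 * n_movies))
    for user in ratings:
        k = len(user)
        if k < 2:
            continue  # a user with a single rating creates no links
        weight = 1.0 / (k - 1)
        for a, r_a in user.items():
            for b, r_b in user.items():
                if a != b:  # assumption: no links from a movie to itself
                    # rating r of movie a occupies row/column 5*a + (r - 1)
                    W[5 * a + r_a - 1, 5 * b + r_b - 1] += weight
    return W

# The user of fig. 1: movies 1, 2, 3 (indices 0, 1, 2) rated 5, 3, 4.
W = build_weight_matrix([{0: 5, 1: 3, 2: 4}], n_movies=3)
print(W[4, 7])   # link between (movie 1, rating 5) and (movie 2, rating 3)
print(W.sum())   # total weight over ordered pairs of linked ratings
```

A dense array is used only for readability; for realistic data sizes a sparse representation would be the natural choice.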

The non-normalized weights $W_{\alpha\beta}$ form a symmetric matrix
$W$ with dimensions $5N \times 5N$. By column normalization
of $W$ we obtain a nonsymmetric matrix $\Omega$. It describes
a diffusion process on the underlying network with
the outgoing weights from any node in the graph normalized
to unity (see also a similar diffusion-like process in [14]
and the PageRank algorithm^2).
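
The column normalization can be sketched as follows. The helper name `column_normalize` and the small example matrix are hypothetical; columns that carry no weight (ratings nobody has given) are left at zero, which is an assumption of this sketch rather than something stated in the text.

```python
import numpy as np

def column_normalize(W):
    """Divide every nonzero column of W by its sum; keep empty columns at zero."""
    col_sums = W.sum(axis=0)
    omega = np.zeros_like(W, dtype=float)
    nonzero = col_sums > 0
    omega[:, nonzero] = W[:, nonzero] / col_sums[nonzero]
    return omega

# Toy symmetric weights between three nodes plus one isolated node.
W = np.array([[0.0, 1.0, 0.5, 0.0],
              [1.0, 0.0, 0.5, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
omega = column_normalize(W)
print(omega.sum(axis=0))          # outgoing weight of every node
print(np.allclose(omega, omega.T))  # symmetry is lost by the normalization
```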

Now we shall investigate the equation

$$\Omega h = h, \qquad (2)$$

where $h$ is a $5N$-dimensional vector (the first 5 elements
correspond to movie 1, the next 5 elements to movie 2, etc.).
Denote by $n_{\alpha s}$ ($\alpha = 1, \dots, N$, $s = 1, \dots, 5$) the number of
times movie $\alpha$ has been rated with the rating $s$. Here we
exclude the votes given by the users who have rated only
one movie because these users do not contribute to $\Omega$. It
is easy to prove that the vector

$$h^* = (n_{11}, \dots, n_{15}, \dots, n_{N1}, \dots, n_{N5})^T \qquad (3)$$

is a solution of eq. (2). Moreover, the solution is unique up
to multiplication by a constant and, as we will see later, all
vectors of the form $\lambda h^*$, $\lambda \neq 0$, lead to identical predictions.
Denote by $L := 1 - \Omega$ the Laplace matrix; the aforementioned
uniqueness of $h^*$ is equivalent to $\mathrm{rank}(L) = 5N - 1$, which
we prove in the following paragraph. It is worthwhile to
emphasize that the unique solution $h^*$ reproduces some
features of the original input data, which strongly supports
the rationality and relevance of the construction of $\Omega$.

^1 Here one can recall the famous set of equations for the PageRank
$G(i)$ of webpage $i$. It has the form $G(i) = \alpha + (1 - \alpha) \sum_{j \sim i} G(j)/k_j$,
where the subscript $j$ runs over all the webpages that contain a link
to webpage $i$ ($j \sim i$) and $k_j$ is the number of outgoing links of
webpage $j$; for details see [13]. Here a similar scaling
of the contributions by the inverse of the node degree arises. By
a numerical solution of the set, one obtains values $G(i)$ which are
essential for the Google search algorithm.

^2 Incidentally, the PageRank algorithm normalizes the flux outgoing
from a node in a similar way and thus it also represents diffusion or
a random walk. If one chooses the row normalization instead, the
resulting process is equivalent to heat conduction in the network.

Using elementary row/column operations one can shift
all the rows/columns corresponding to the zero rows/zero
columns of $\Omega$ to the bottom and right of $L$, leading to
$\begin{pmatrix} L' & O \\ O & 1 \end{pmatrix}$, where $O$ and $1$ are the zero and the identity matrix.
The dimension of $1$ we label as $D$; the dimension of $L'$
is then $5N - D$. The matrix $L'$ has four properties: (i) All
its diagonal elements are 1. (ii) All its non-diagonal elements
lie in the range $[-1, 0]$. (iii) The sum of each column
is zero. (iv) In each row, there is at least one non-diagonal
nonzero element. One can prove that the rank of any matrix
with these four properties is equal to its dimension
minus one, $5N - D - 1$ in this case. Since $\mathrm{rank}(1) = D$,
together we have $\mathrm{rank}(L) = \mathrm{rank}(L') + \mathrm{rank}(1) = 5N - 1$.
Details of the proof will be shown in an extended paper.
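
Both statements, the stationarity of the vector of rating counts and the rank of the Laplace matrix, can be checked numerically on a small example. The sketch below uses hypothetical toy data (three users, three movies) whose rating network happens to be connected; the variable names are illustrative and the dense NumPy construction is an assumption of this sketch, not the authors' implementation.

```python
import numpy as np

# Hypothetical toy data: three users, three movies (zero-based indices).
ratings = [{0: 5, 1: 3, 2: 4}, {0: 5, 1: 4}, {1: 3, 2: 2}]
n_movies = 3
dim = 5 * n_movies

W = np.zeros((dim, dim))
counts = np.zeros(dim)   # rating counts, ignoring users with a single rating
for user in ratings:
    k = len(user)
    if k < 2:
        continue
    for a, r_a in user.items():
        counts[5 * a + r_a - 1] += 1
        for b, r_b in user.items():
            if a != b:
                W[5 * a + r_a - 1, 5 * b + r_b - 1] += 1.0 / (k - 1)

# Column normalization; columns of ratings never given stay at zero.
col_sums = W.sum(axis=0)
omega = np.zeros_like(W)
nonzero = col_sums > 0
omega[:, nonzero] = W[:, nonzero] / col_sums[nonzero]

L = np.eye(dim) - omega
print(np.allclose(omega @ counts, counts))  # is the count vector stationary?
print(np.linalg.matrix_rank(L), dim - 1)    # rank(L) compared with 5N - 1
```

The rank check depends on the toy network being connected; it illustrates the statement on one example and is not a substitute for the proof.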

The matrix $\Omega$ encodes the connectivity between different
ratings in the movie-to-movie network, and can yield
a recommendation for a particular user. Since the matrix
represents only aggregated information, in order to
recommend for a particular user we need to utilize the opinions
expressed by this user. We do so by imposing these
ratings as fixed elements of $h$ in eq. (2). These fixed elements
can be considered as a boundary condition of the
given diffusion process; they influence our expectations of
unexpressed ratings. In other words, large weights in $\Omega$
represent strong patterns in user ratings (e.g. most of
those who rated movie X with 5 gave 3 to movie Y) and
diffusion of the ratings expressed by a particular user in
the movie-to-movie network makes use of these patterns.

The discussion above leads us to the equation

$$\Omega_i h_i = h_i, \qquad (4)$$

where $\Omega_i := \Omega$ for the rows corresponding to the movies
unrated by user $i$ and $\Omega_i := 1$ for the remaining rows.
Such a definition keeps the entries corresponding to the movies
rated by user $i$ preserved. The solution of eq. (4) can be
obtained numerically in a simple iterative way. We start
with $h_i^{(0)}$, where the elements corresponding to the movies
rated by user $i$ are set according to these ratings and the
remaining elements are set to zero. Then by the iteration
equation $h_i^{(n+1)} = \Omega_i h_i^{(n)}$ we propagate the already expressed
opinions of user $i$ over the network, eventually leading
to the stationary solution $h_i$. Intermediate results $h_i^{(n)}$
contain information about the movies unrated by user $i$,
which can give rise to a recommendation. We obtain the
rating prediction as the standard weighted average. For
example, if for a given movie in $h_i$ we obtain the 5-tuple
$(0.1, 0.2, 0.4, 0.3, 0.0)^T$, the rating prediction is
$\hat{v} = 1 \cdot 0.1 + 2 \cdot 0.2 + 3 \cdot 0.4 + 4 \cdot 0.3 = 2.9$.
Notice that if a user has rated no movies, we have to use
a different method (for example the movie average introduced
later) to make a prediction. This feature is common
for recommender systems producing personalized predictions.
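
The iteration and the weighted-average prediction can be sketched as below. The function names `diffuse` and `predict`, the dict-based input and the tiny, mostly empty transition matrix are assumptions made for illustration; dividing by the total weight of the 5-tuple is also an assumption of this sketch, made so that tuples that do not sum to one still give a value on the 1 to 5 scale.

```python
import numpy as np

def diffuse(omega, known, n_iter=1):
    """Iterate h <- Omega_i h, keeping the entries of rated movies fixed.

    omega: (5N x 5N) column-normalized matrix.
    known: dict {movie_index: rating in 1..5} for one user.
    """
    n_movies = omega.shape[0] // 5
    h0 = np.zeros(5 * n_movies)
    fixed = np.zeros(5 * n_movies, dtype=bool)
    for a, r in known.items():
        h0[5 * a + r - 1] = 1.0
        fixed[5 * a:5 * a + 5] = True
    h = h0.copy()
    for _ in range(n_iter):
        h = omega @ h
        h[fixed] = h0[fixed]   # rows where Omega_i equals the identity
    return h.reshape(n_movies, 5)

def predict(five_tuple):
    """Weighted average of the ratings 1..5; NaN if no weight arrived."""
    total = five_tuple.sum()
    if total == 0:
        return float("nan")
    return float(five_tuple @ np.arange(1, 6) / total)

# Hypothetical two-movie network: users who gave movie 0 a 5
# split evenly between ratings 3 and 4 for movie 1.
omega = np.zeros((10, 10))
omega[[7, 8], 4] = 0.5
h = diffuse(omega, {0: 5})
print(predict(h[1]))
print(predict(np.array([0.1, 0.2, 0.4, 0.3, 0.0])))
```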

Avoiding the iterations. - While simple, the iterative
way to solve eq. (4) has one important drawback: the
iterations have to be made for every user separately.
Consequently, the computational complexity of the algorithm
is high. To get rid of this difficulty we rewrite eq. (4) as
$L h_i = j_i$, where again $L = 1 - \Omega$. Here the external flux $j_i$ is
nonzero only for the elements representing the boundary
condition of user $i$.

The solution $h_i$ can be formally written in the form
$h_i = G j_i$. This resembles the well-known Green function
approach: once $G$ is known, $h_i$ can be found by a simple
matrix multiplication. While the source term $j_i$ is not
known a priori, we can get rid of it by reshuffling the
movies and grouping the boundary elements in $h_i$. After
this formal manipulation we obtain

$$\begin{pmatrix} h_i^B \\ h_i^F \end{pmatrix} = \begin{pmatrix} G_{BB} & G_{BF} \\ G_{FB} & G_{FF} \end{pmatrix} \begin{pmatrix} j_i^B \\ 0 \end{pmatrix}, \qquad (5)$$

where B stands for boundary and F for free. Now it follows
that $h_i^B = G_{BB} j_i^B$ and $h_i^F = G_{FB} j_i^B$, leading us to the final
result

$$h_i^F = G_{FB} G_{BB}^{-1} h_i^B. \qquad (6)$$

Since most users have rated only a small part of all $N$
movies, the dimension of $G_{BB}$ is usually much smaller than
that of $G$ and thus the inversion of $G_{BB}$ is cheap.
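
The elimination of the unknown source term can be illustrated numerically. In the sketch below a random matrix stands in for the Green function (an assumption made purely for illustration, not the matrix of the model), and the index sets `B` and `F` are hypothetical; the point is only that the free part of the solution is recovered from the boundary part through the small block $G_{BB}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_b = 8, 3                    # total size and number of boundary entries
G = rng.normal(size=(n, n))      # stand-in for the Green function (assumption)
B = np.arange(n_b)               # boundary indices after reshuffling
F = np.arange(n_b, n)            # free indices

j = np.zeros(n)
j[B] = rng.normal(size=n_b)      # the source is nonzero only on the boundary
h = G @ j                        # full solution, used here only as a reference

# Eq. (6): eliminate the unknown source using the boundary block alone.
h_free = G[np.ix_(F, B)] @ np.linalg.solve(G[np.ix_(B, B)], h[B])
print(np.allclose(h_free, h[F]))
```

Solving the small linear system with `np.linalg.solve` avoids forming the inverse of the boundary block explicitly, which is usually preferable numerically.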

The last missing point is that since $L$ is singular (as
we have mentioned, $\mathrm{rank}(L) = 5N - 1$), $G$
cannot be obtained by inverting $L$. Hence we use the
Moore-Penrose pseudoinverse [15]

$$G = L^{\dagger} = \lim_{k \to \infty} \left[ 1 + \Omega + \Omega^2 + \dots + \Omega^k - k\, w_R w_L \right], \qquad (7)$$

where $w_R$ and $w_L$ are the right and left eigenvectors of $\Omega$,
respectively, both corresponding to the eigenvalue 1. For
practical purposes, the infinite summation in eq. (7) can
be truncated at a finite value of $k$.

Personal polarization. - Before the described
method can be used in real-life examples, there is one important
technical problem to solve. Each user has a different style
of rating: some people tend to be very strict and on average
give low marks, some people prefer to give either 1 or
5, some don't like to give low marks, and so forth. Thus,
ratings cannot be grouped together in the matrices $W_{\alpha\beta}$ in
the straightforward and naive way we described before, for
they mean different things to different people.

To deal with this phenomenon, which we refer to as
personal polarization, ratings from different
users are unified before the users' contributions are summed in the
object-to-object network. Consequently, before reporting
the resulting predictions to a user, the output of the algorithm
has to be shifted back to the user's scale, that is, personalized.

To characterize the rating profile of user $i$ we use the
mean $\mu_i$ and the standard deviation $\sigma_i$ of the votes given
by this user, and we compare these values with the mean $m_i$
and the standard deviation $s_i$ of the ratings given by all
users. Notably, the quantities $m_i$ and $s_i$ take into account
only the movies rated by user $i$: if a user has a low average
rating because he has been rating only bad movies, there
is no need to manipulate his ratings. To conform a user
rating profile to the society rating profile we use the linear
transformation

$$u_{i\alpha} = m_i + (v_{i\alpha} - \mu_i) \frac{s_i}{\sigma_i}. \qquad (8)$$

Personalization of the predicted value is done by the inverse
formula $v_{i\alpha} = \mu_i + (u_{i\alpha} - m_i) \sigma_i / s_i$. We can notice
that while $v_{i\alpha}$ is an integer value, $u_{i\alpha}$ is a real number.
Nevertheless, one can obtain its vector representation in
a straightforward way: e.g. $u = 3.7$ is modelled by the
vector $(0, 0, 0.3, 0.7, 0)^T$; the weighted mean corresponding
to this vector is equal to the input value 3.7.
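
A minimal sketch of the unification, the personalization and the vector representation of a real-valued rating is given below. The function names and the example numbers are hypothetical. Clipping unified values to the interval [1, 5] is an assumption of this sketch (the text does not say how values outside the scale are handled), and users with zero standard deviation would need separate treatment.

```python
import numpy as np

def unify(v, mu_i, sigma_i, m_i, s_i):
    """Eq. (8): map a rating from the user's scale to the society scale."""
    return m_i + (v - mu_i) * s_i / sigma_i

def personalize(u, mu_i, sigma_i, m_i, s_i):
    """Inverse of eq. (8): map a unified value back to the user's scale."""
    return mu_i + (u - m_i) * sigma_i / s_i

def to_vector(u):
    """Spread a real rating u over the two nearest integer ratings."""
    u = min(max(u, 1.0), 5.0)        # assumption: clip values outside the scale
    low = min(int(np.floor(u)), 4)   # lower of the two neighbouring ratings
    vec = np.zeros(5)
    vec[low] = u - low               # weight of rating low + 1
    vec[low - 1] = 1.0 - (u - low)   # weight of rating low
    return vec

# A strict-looking example user: mean 4.5, spread 0.5; society: mean 3.5, spread 1.0.
u = unify(4, mu_i=4.5, sigma_i=0.5, m_i=3.5, s_i=1.0)
print(u, personalize(u, mu_i=4.5, sigma_i=0.5, m_i=3.5, s_i=1.0))
print(to_vector(3.7))
```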

Benchmark methods. - In correlation-based methods,
rating correlations between users are quantified and
utilized to obtain predictions. We present here one implementation
of such a method, which serves as a benchmark
for the proposed diffusion model. The correlation $C_{ij}$ between
users $i$ and $j$ is calculated with Pearson's formula

$$C_{ij} = \frac{\sum_\alpha^* (v_{i\alpha} - \mu_i)(v_{j\alpha} - \mu_j)}{\sqrt{\sum_\alpha^* (v_{i\alpha} - \mu_i)^2}\, \sqrt{\sum_\alpha^* (v_{j\alpha} - \mu_j)^2}}, \qquad (9)$$

where we sum over all movies rated by both $i$ and $j$ (as a
reminder, a star is added to the summation symbols);
$C_{ij} := 0$ when users $i$ and $j$ have no movies in common.
Due to the data sparsity, the number of user pairs
with zero correlation can be high and the resulting prediction
performance poor. To deal with this effect, in [16] it
is suggested to replace the zero correlations by the society
average of $C_{ij}$. In the numerical tests presented in this
Letter the resulting improvement was small and thus we
use eq. (9) in its original form. Finally, the predictions are
obtained using the formula

$$\hat{v}_{i\alpha} = \mu_i + \frac{\sum_j' C_{ij} (v_{j\alpha} - \mu_j)}{\sum_j' |C_{ij}|}. \qquad (10)$$

Here we sum over the users who have rated movie $\alpha$ (prime
symbols added to the sums indicate this); the term
$\sum_j' |C_{ij}|$ serves as a normalization factor.
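
A correlation-based predictor of this general kind can be sketched as follows. This is a generic Pearson-weighted scheme written for illustration and may differ in detail from the formulas used in the Letter: the function names, the dict-based input, the choice to take each user's mean over all of that user's ratings, and the normalization by the sum of absolute correlations are all assumptions of this sketch.

```python
import numpy as np

def pearson(u, v):
    """Pearson-type correlation of two users over their co-rated movies.

    u, v: dicts {movie_index: rating}. Means are taken over each user's
    full rating list (an assumption of this sketch).
    """
    common = sorted(set(u) & set(v))
    if not common:
        return 0.0   # no movies in common
    du = np.array([u[a] for a in common]) - np.mean(list(u.values()))
    dv = np.array([v[a] for a in common]) - np.mean(list(v.values()))
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    return float(du @ dv / denom) if denom > 0 else 0.0

def predict_cb(ratings, i, movie):
    """Mean of user i plus correlation-weighted deviations of other raters."""
    mu_i = np.mean(list(ratings[i].values()))
    num = den = 0.0
    for j, other in enumerate(ratings):
        if j == i or movie not in other:
            continue   # only users who have rated the movie contribute
        c = pearson(ratings[i], other)
        num += c * (other[movie] - np.mean(list(other.values())))
        den += abs(c)
    return float(mu_i + num / den) if den > 0 else float(mu_i)

# Hypothetical toy data: predict the rating of user 0 for movie 2.
ratings = [{0: 5, 1: 3}, {0: 4, 1: 2, 2: 3}, {0: 1, 1: 5, 2: 4}]
print(pearson(ratings[0], ratings[1]))
print(predict_cb(ratings, 0, 2))
```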

As a second benchmark method we use recommendation
by the movie average (MA), where one has $\hat{v}_{i\alpha} = m_\alpha$ and
$m_\alpha$ is the average rating of movie $\alpha$. This method is not
personalized (for a given object, all users obtain the same
prediction) and has inferior performance. As it is very
fast and easy to implement, it is still widely used. Notably,
when the unification-personalization scheme is employed together
with MA, the predictions become personalized. As we
will see later, in this way the prediction performance is
increased considerably without a notable impact on the
computational complexity.

Numerical results. - To test the proposed method
based on opinion diffusion (OD) we use the GroupLens
project data, available at www.grouplens.org. The total
number of users is $M = 943$, the total number of movies
is $N = 1682$, and the ratings are integer values from 1 to
5. The number of given ratings is 100000, corresponding
to a voting matrix in which around 6% of the entries are filled.

To test the described methods, a randomly selected 10%
of the available data is transferred to the probe file $\mathcal{P}$, and
the remaining 90% is used as input data for the recommendation.
Then we make a prediction for all entries
contained in the probe and measure the difference between
the predicted value $\hat{v}_{i\alpha}$ and the actual value $v_{i\alpha}$. For an
aggregate review of the prediction performance we use two
common quantities: root mean square error (RMSE) and
mean absolute error (MAE). They are defined as

$$\mathrm{MAE} = \frac{1}{n} \sum_{\mathcal{P}} |\hat{v}_{i\alpha} - v_{i\alpha}|, \qquad (11a)$$

$$\mathrm{RMSE} = \left[ \frac{1}{n} \sum_{\mathcal{P}} (\hat{v}_{i\alpha} - v_{i\alpha})^2 \right]^{1/2}, \qquad (11b)$$

where the summations go over all user-movie pairs $(i, \alpha)$
included in the probe $\mathcal{P}$ and $n$ is the number of these
pairs in each probe dataset. To obtain better statistics,
the described procedure can be repeated many times with
different selections of the probe data. We used 10 repetitions
and, in addition to the averages of MAE and RMSE,
we also computed the standard deviations of both quantities.
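
The evaluation protocol can be sketched in a few lines. The function names, the fixed random seed and the index-based split are assumptions made for illustration; the two error measures follow the usual definitions of mean absolute error and root mean square error.

```python
import numpy as np

def mae(predicted, actual):
    """Mean absolute error over the probe entries."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(diff)))

def rmse(predicted, actual):
    """Root mean square error over the probe entries."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def split_probe(n_ratings, probe_fraction=0.1, seed=0):
    """Return index arrays (training, probe) for a random split of the ratings."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_ratings)
    n_probe = int(round(probe_fraction * n_ratings))
    return order[n_probe:], order[:n_probe]

train_idx, probe_idx = split_probe(100000)
print(len(train_idx), len(probe_idx))
print(mae([3.0, 4.5], [4, 4]), rmse([3.0, 4.5], [4, 4]))
```

Repeating the split with different seeds and averaging the two measures reproduces the structure of the procedure described above.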

Contrary to expectations, fig. 2 shows
that the prediction performance gets slightly worse
when more than one iteration of eq. (4) is used to
obtain the prediction. This is probably due to
overfitting: starting from the second iteration, our expectations
are influenced not only by actually expressed
ratings but also by our expectations about unexpressed
ratings obtained in previous iteration steps. Nevertheless,
as will be shown later, the performance achieved by the
first iteration is good and justifies the validity of the proposed
model. In the following paragraphs we use only one iteration
to obtain the predictions. Consequently, the Green
function method introduced above is not necessary here; we
decided to present it in this paper because it can be useful
with other datasets.

Fig. 2: Prediction performance for the predictions $\hat{v}_{i\alpha}$ obtained
by iterations of eq. (4) using various numbers of iteration
steps.



In table 1 we compare the prediction accuracy of
the movie-average method (MA), the correlation-based
method (CB), and opinion diffusion (OD). To
measure the prediction performance we use both RMSE
and MAE as defined above. All three methods are
tested both with and without employing the unification-personalization
scheme. In accordance with expectations,
for MA and OD the performance with unification
is better than without it; for the simplest tested
method, MA, the difference is particularly remarkable. By
contrast, CB is barely sensitive to the unification procedure,
and when we drop the multiplication by $\sigma_i/s_i$ from the
unification-personalization process given by eq. (8), the
difference disappears completely (which can also be confirmed
analytically). According to the prediction performance
shown in table 1 we can conclude that the diffusion
method clearly outperforms the other two in all tested
cases (RMSE/MAE, with/without unification). When
computational complexity is taken into account, it can be
shown that if $M > N$, the proposed method is more efficient
than correlation-based methods (but, of course, less
efficient than using the movie average).

Conclusion. - We have proposed a novel recommendation
method based on diffusion of opinions expressed
by a user over the object-to-object network. Since the
rating polarization effect is present, we have suggested the
unification-personalization approach as an additional layer
of the recommender system. To allow a reduction of computation
with some datasets, a Green function method has
been introduced. The proposed method has been compared
with two standard recommendation algorithms and
it has achieved consistently better results. Notably, it
is executable even for the large dataset (17770 movies,
480189 users) released by Netflix (a DVD rental company,
see www.netflixprize.com). In addition, our model is
essentially tune-free: it does not require extensive testing
and optimization to produce high-quality output. This
is good news for practitioners.

Table 1: Comparison of the three recommendation methods:
movie average (MA), correlation-based method (CB), and
opinion diffusion (OD). Presented values are averages obtained
using 10 different probes; standard deviations are approximately
0.01 in all investigated cases.

            no unification      with unification
method      RMSE     MAE        RMSE          MAE
MA          1.18     0.91       1.01          0.79
CB          1.09     0.86       1.09          0.86
OD          1.00     0.80       [illegible]   0.73